LIPT: A Lossless Text Transform to Improve Compression
Abstract
We propose an approach for developing a dictionary-based, reversible, lossless text transformation called LIPT (Length Index Preserving Transform), which can be applied to source text to improve the ability of existing algorithms to compress it. In LIPT, the length of an input word and the offset of the word in the dictionary are denoted with letters of the alphabet. Our encoding scheme makes use of the recurrence of words of the same length in the English language to create context in the transformed text that entropy coders can exploit. LIPT achieves some compression at the preprocessing stage and also retains enough context and redundancy for the compression algorithms to give better results. For our test corpus, Bzip2 with LIPT gives a 5.24% improvement in average BPC (bits per character) over Bzip2 without LIPT, and PPMD with LIPT gives a 4.46% improvement in average BPC over PPMD without LIPT.
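The Python sketch below illustrates the idea described in the abstract: dictionary words are grouped by length, and each word is replaced by a codeword built from a letter for its length and letters for its offset within that length group. The concrete codeword layout (a `*` marker, `a` to `z` for lengths 1 to 26, a bijective base-52 offset over `a`-`z`/`A`-`Z`), the toy dictionary, and all function names (`build_codebook`, `lipt_encode`, `lipt_decode`) are hypothetical choices for illustration, not the paper's exact format. The sketch also assumes the input contains no `*` and simply passes capitalized or unknown words through unchanged.

```python
import re
import string

# Hypothetical offset alphabet: 52 symbols (a-z, then A-Z).
OFFSET_ALPHABET = string.ascii_lowercase + string.ascii_uppercase
BASE = len(OFFSET_ALPHABET)  # 52

WORD = re.compile(r"[A-Za-z]+")    # maximal runs of letters in the source text
CODE = re.compile(r"\*[A-Za-z]+")  # '*' followed by length letter + offset letters


def offset_to_letters(n: int) -> str:
    """Bijective base-52: 0 -> '', 1..52 -> one letter, 53..2756 -> two letters, ..."""
    letters = []
    while n > 0:
        n -= 1
        letters.append(OFFSET_ALPHABET[n % BASE])
        n //= BASE
    return "".join(reversed(letters))


def letters_to_offset(s: str) -> int:
    """Inverse of offset_to_letters."""
    n = 0
    for ch in s:
        n = n * BASE + OFFSET_ALPHABET.index(ch) + 1
    return n


def build_codebook(words):
    """Assign each dictionary word '*' + <length letter> + <offset within its length group>.

    Assumption: `words` is already ordered (e.g. by frequency) within each length.
    """
    encode, decode, group_sizes = {}, {}, {}
    for w in words:
        if w in encode or not 1 <= len(w) <= 26:
            continue  # skip duplicates and lengths not expressible as one letter a..z
        offset = group_sizes.get(len(w), 0)
        code = "*" + string.ascii_lowercase[len(w) - 1] + offset_to_letters(offset)
        group_sizes[len(w)] = offset + 1
        encode[w] = code
        decode[code] = w
    return encode, decode


def lipt_encode(text: str, encode: dict) -> str:
    """Replace dictionary words with codewords; other text passes through unchanged."""
    if "*" in text:
        raise ValueError("this sketch assumes the input contains no '*'")
    return WORD.sub(lambda m: encode.get(m.group(), m.group()), text)


def lipt_decode(text: str, decode: dict) -> str:
    """Every '*' starts a codeword, so map each one back to its dictionary word."""
    return CODE.sub(lambda m: decode[m.group()], text)


if __name__ == "__main__":
    enc, dec = build_codebook(["the", "and", "of", "compression", "text", "a"])
    sample = "the text and the compression of a zebra."
    transformed = lipt_encode(sample, enc)
    print(transformed)  # *c *d *ca *c *k *b *a zebra.
    assert lipt_decode(transformed, dec) == sample
```

In this toy run, all 3-letter dictionary words share the prefix `*c`. That repeated same-length context is the kind the abstract says entropy coders can exploit, while long words such as `compression` shrink to a few characters, giving some compression at the preprocessing stage.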
Similar resources
LIPT: A Reversible Lossless Text Transform to Improve Compression Performance
Lossless compression researchers have developed highly sophisticated approaches, such as Huffman encoding, arithmetic encoding, the Lempel-Ziv family, Dynamic Markov Compression (DMC), Prediction by Partial Matching (PPM), and Burrows-Wheeler Transform (BWT) based algorithms. We propose an alternative approach in this paper to develop a reversible transformation that can be applied to a source ...
Transform Methods Used in Lossless Compression of Text Files
This paper presents a study of transform methods used in lossless text compression to preprocess the text by exploiting the inner redundancy of the source file. The transform methods are the Burrows-Wheeler Transform (BWT, also known as Block Sorting), the Star Transform, and the Length Index Preserving Transform (LIPT). BWT converts the original blocks of data into a format that is extremely well s...
Analysis of Lossless Reversible Transformation Algorithms to Enhance Data Compression
In this paper we analyze and present the benefits offered in lossless compression by applying a selection of preprocessing methods that exploit the redundancy of the source file. Textual data has a number of properties that can be taken into account to improve compression. Preprocessing copes with these properties by applying a number of transformations that make th...
Dictionary-Based Fast Transform for Text Compression
In this paper we present StarNT, a dictionary-based, fast, lossless text transform algorithm. With a static generic dictionary, StarNT achieves a better compression ratio than almost all other recent efforts based on BWT and PPM. The algorithm uses a ternary search tree to speed up transform encoding. Experimental results show that the average compression time has improved by orders of m...
Lossless and nearly-lossless image compression based on combinatorial transforms. (Compression d'images sans perte ou quasi sans perte basée sur des transformées combinatoires)
Common image compression standards are usually based on frequency transforms such as the Discrete Cosine Transform or wavelets. We present a different approach to lossless image compression, based on combinatorial transforms. The main transform is the Burrows-Wheeler Transform (BWT), which tends to reorder symbols according to their following context. It becomes a promising compression approach bas...
Publication date: 2001